Chapter 18
A Yes-or-No Proposition: Logistic Regression
IN THIS CHAPTER
Figuring out when to use logistic regression
Getting a grip on the basics of logistic regression
Running a logistic regression model and making sense of the output
Watching for common issues with logistic regression
Estimating the sample size you need for logistic regression
You can use logistic regression to analyze the relationship between one or more predictor variables
(the X variables) and a categorical outcome variable (the Y variable). Typical categorical outcomes
include the following two-level variables (which are also called binary or dichotomous):
Lived or died by a certain date
Did or didn’t get diagnosed with Type II diabetes
Responded or didn’t respond to a treatment
Did or didn’t choose a particular health insurance plan
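To make the idea of a two-level outcome concrete, here's a minimal Python sketch of how such an outcome is commonly coded as 0 and 1 before being handed to regression software. The records, the helper name encode_binary, and the choice of "died" as the level coded 1 are all hypothetical and chosen only for illustration. Your software may handle this coding for you.

```python
# Hypothetical outcome records: each value is one of two levels.
outcomes = ["died", "lived", "lived", "died", "lived"]


def encode_binary(values, event_level):
    """Code each value as 1 if it equals event_level, otherwise 0.

    Raises ValueError if more than two distinct levels are present,
    because the outcome would then not be binary.
    """
    levels = set(values)
    if len(levels) > 2:
        raise ValueError(f"expected at most two levels, got {sorted(levels)}")
    return [1 if v == event_level else 0 for v in values]


# Assumption for this sketch: "died" is the event of interest (coded 1).
y = encode_binary(outcomes, event_level="died")
print(y)                # [1, 0, 0, 1, 0]
print(sum(y) / len(y))  # observed proportion with the event: 0.4
```

Which level you code as 1 is your choice, but it determines the direction in which you read the results, so it's worth deciding on purpose rather than by accident.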
In this chapter, we explain logistic regression. We describe the circumstances under which to use it,
the important related concepts, how to execute it with software, and how to interpret the output. We
also point out the pitfalls with logistic regression and show you how to determine the sample sizes you
need to execute such a model.
Using Logistic Regression
Following are typical uses of logistic regression analysis:
To test whether one or more predictors and an outcome are statistically significantly associated.
For example, to test whether age, obesity status, or both are associated with an increased
likelihood of being diagnosed with Type II diabetes.
To overcome the limitations of the 2x2 cross-tab method (described in Chapter 12), which can
analyze only one predictor at a time (and the predictor has to be binary). With logistic regression,
you can analyze multiple predictor variables at a time. Each predictor can be a numeric variable or
a categorical variable having two or more levels.
To quantify the extent or magnitude of an association that has already been established
between a particular predictor and an outcome. In other words, you are seeking to